AITopics | optimal convergence rate

Collaborating Authors

optimal convergence rate

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Optimistic Online-to-Batch Conversions for Accelerated Convergence and Universality

Neural Information Processing Systems Jun-19-2026, 09:56:07 GMT

In this work, we study offline convex optimization with smooth objectives, where Nesterov's classical Accelerated Gradient (NAG) method achieves the optimal accelerated convergence rate. Extensive research has aimed to understand NAG from various perspectives, and a recent line of work approaches it from the viewpoint of online learning and online-to-batch conversion, emphasizing the role of optimistic online algorithms in acceleration. In this work, we contribute to this perspective by proposing novel optimistic online-to-batch conversions that incorporate optimism theoretically into the analysis, thereby significantly simplifying the online algorithm design while preserving the optimal convergence rates. Specifically, we demonstrate the effectiveness of our conversions through the following results: (i) when combined with simple online gradient descent, our optimistic conversion achieves the optimal accelerated convergence rate; (ii) our conversion also applies to strongly convex objectives, and by leveraging both optimistic online-to-batch conversion and optimistic online algorithms, we achieve the optimal accelerated convergence rate for strongly convex and smooth objectives, for the first time through the lens of online-to-batch conversion; (iii) our optimistic conversion achieves universality to smoothness (it applies to both smooth and non-smooth objectives without requiring knowledge of the smoothness coefficient) and remains as efficient as non-universal methods, using only one gradient query per iteration. Finally, we highlight the effectiveness of our optimistic online-to-batch conversions by establishing a precise correspondence with NAG.
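
For reference, the classical NAG baseline that the abstract builds on can be sketched in a few lines. The Python snippet below is a minimal illustration, not the authors' optimistic online-to-batch conversion: it assumes the FISTA-style momentum sequence t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2, for which f(x_K) - f* <= 2 L ||x_0 - x*||^2 / (K + 1)^2 holds on any L-smooth convex f. The quadratic test problem and the names `nag`, `f` and `grad_f` are hypothetical choices made only for this example.

```python
import math


def nag(grad, x0, L, iters):
    """Nesterov's accelerated gradient with FISTA-style momentum for an L-smooth convex f."""
    x_prev = list(x0)  # x_{k-1}
    y = list(x0)       # extrapolated point y_k, with y_1 = x_0
    t = 1.0            # momentum sequence t_k, with t_1 = 1
    for _ in range(iters):
        g = grad(y)
        x = [yi - gi / L for yi, gi in zip(y, g)]  # gradient step of size 1/L from y_k
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_next  # momentum weight (0 on the first step)
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        x_prev, t = x, t_next
    return x_prev  # x_K after `iters` steps


# Hypothetical test problem: f(x) = 0.5 * sum_i a_i (x_i - c_i)^2, L = max(a), minimiser x* = c.
a = [1.0, 10.0, 100.0]
c = [1.0, -2.0, 3.0]


def f(x):
    return 0.5 * sum(ai * (xi - ci) ** 2 for ai, xi, ci in zip(a, x, c))


def grad_f(x):
    return [ai * (xi - ci) for ai, xi, ci in zip(a, x, c)]


x_hat = nag(grad_f, [0.0, 0.0, 0.0], L=max(a), iters=50)
# ||x_0 - x*||^2 = 1 + 4 + 9 = 14, so the O(1/K^2) bound after 50 steps is 2 * L * 14 / 51^2
print(f(x_hat), 2 * max(a) * 14.0 / 51 ** 2)
```

The gradient is only ever queried at the extrapolated point y_k, which is the single-gradient-per-iteration structure that online-to-batch analyses of NAG try to explain.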

artificial intelligence, ext, machine learning

Neural Information Processing Systems

Country: Asia (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Industry: Education > Educational Setting > Online (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Revisiting Optimal Convergence Rate for Smooth and Non-convex Stochastic Decentralized Optimization

Neural Information Processing Systems Apr-28-2026, 07:13:04 GMT

Decentralized optimization is effective at reducing communication in large-scale machine learning. Although numerous algorithms have been proposed with theoretical guarantees and empirical success, the performance limits of decentralized optimization, especially the influence of the network topology and its associated weight matrix on the optimal convergence rate, are not yet fully understood. While Lu and Sa [44] recently provided an optimal rate for non-convex stochastic decentralized optimization with weight matrices defined over linear graphs, the optimal rate with general weight matrices remains unclear. This paper revisits non-convex stochastic decentralized optimization and establishes an optimal convergence rate with general weight matrices. In addition, we establish the optimal rate when the non-convex loss functions further satisfy the Polyak-Łojasiewicz (PL) condition. These results cannot be obtained by following existing lines of analysis in the literature. Instead, we leverage the Ring-Lattice graph to admit general weight matrices while maintaining the optimal relation between the graph diameter and weight matrix connectivity. Lastly, we develop a new decentralized algorithm that nearly attains the above two optimal rates under additional mild conditions.
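
To make the role of the weight matrix concrete, here is a minimal Python sketch of decentralized (stochastic) gradient descent. It assumes scalar local objectives f_i(x) = 0.5 (x - b_i)^2, uniform weights on a ring lattice in which each node averages itself and its k nearest neighbours on each side, and a constant step size. The functions `ring_lattice_weights` and `decentralized_sgd` are hypothetical illustrations; they are not the weight construction or the new algorithm from the paper.

```python
import random


def ring_lattice_weights(n, k):
    """Symmetric, doubly stochastic W: node i averages itself and its k nearest neighbours per side."""
    if k < 1 or 2 * k >= n:
        raise ValueError("need 1 <= k and 2k < n so that all neighbours are distinct")
    w = 1.0 / (2 * k + 1)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for d in range(-k, k + 1):
            W[i][(i + d) % n] = w
    return W


def decentralized_sgd(W, b, eta, iters, sigma=0.0, seed=0):
    """Each node i holds f_i(x) = 0.5 * (x - b_i)^2; the global minimiser is mean(b)."""
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        # local stochastic gradients (Gaussian noise of scale sigma; sigma = 0 gives exact gradients)
        g = [x[i] - b[i] + sigma * rng.gauss(0.0, 1.0) for i in range(n)]
        # gossip: mix neighbours' iterates through W, then take the local gradient step
        x = [sum(W[i][j] * x[j] for j in range(n)) - eta * g[i] for i in range(n)]
    return x


W = ring_lattice_weights(8, 1)  # k = 1 is the plain ring
b = [float(i) for i in range(8)]
x = decentralized_sgd(W, b, eta=0.05, iters=2000)
print(sum(x) / len(x), max(x) - min(x))  # network average vs. residual disagreement
```

Because W is symmetric and doubly stochastic, the network average follows exact gradient descent on the average objective, while the constant step size leaves individual nodes some topology-dependent distance from consensus; a larger k (better connectivity) tends to shrink that gap.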

artificial intelligence, machine learning, optimization

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Adaptive Variance Reduction for Stochastic Optimization under Weaker Assumptions

Neural Information Processing Systems Mar-19-2026, 05:11:23 GMT

This paper explores adaptive variance reduction methods for stochastic optimization based on the STORM technique. Existing adaptive extensions of STORM rely on strong assumptions such as bounded gradients and bounded function values, or suffer from an additional $\mathcal{O}(\log T)$ term in the convergence rate. To address these limitations, we introduce a novel adaptive STORM method that, with a newly designed learning rate strategy, achieves an optimal convergence rate of $\mathcal{O}(T^{-1/3})$ for non-convex functions. Compared with existing approaches, our method requires weaker assumptions and attains the optimal convergence rate without the additional $\mathcal{O}(\log T)$ term. We also extend the proposed technique to stochastic compositional optimization, obtaining the same optimal rate of $\mathcal{O}(T^{-1/3})$. Furthermore, we investigate the non-convex finite-sum problem and develop another innovative adaptive variance reduction method that achieves an optimal convergence rate of $\mathcal{O}(n^{1/4} T^{-1/2})$, where $n$ is the number of component functions.
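
The core of STORM is a recursive momentum estimator that evaluates one fresh sample at both the current and the previous iterate: d_t = ∇f(x_t; ξ_t) + (1 - a)(d_{t-1} - ∇f(x_{t-1}; ξ_t)). The Python sketch below shows that update with a constant step size `eta` and a constant momentum parameter `a`; these fixed values are assumptions for illustration, and the sketch does not reproduce the paper's adaptive learning-rate strategy. The scalar test problem f(x) = E[0.5 (x - ξ)^2] with ξ ~ N(2, 1) is likewise hypothetical.

```python
import random


def storm(stoch_grad, sample, x0, eta, a, iters, seed=0):
    """STORM with constant eta and a (assumed, not adaptive); scalar x for simplicity."""
    rng = random.Random(seed)
    x_prev = x0
    d = stoch_grad(x_prev, sample(rng))  # d_0: a plain stochastic gradient
    x = x_prev - eta * d
    for _ in range(iters):
        xi = sample(rng)  # one fresh sample, evaluated at BOTH x_t and x_{t-1}
        d = stoch_grad(x, xi) + (1.0 - a) * (d - stoch_grad(x_prev, xi))
        x_prev, x = x, x - eta * d
    return x


mu = 2.0  # minimiser of f(x) = E[0.5 * (x - xi)^2] with xi ~ N(mu, 1)
x_final = storm(lambda x, xi: x - xi, lambda rng: rng.gauss(mu, 1.0),
                x0=0.0, eta=0.1, a=0.1, iters=2000)
print(x_final)
```

On this toy problem the estimator's error is an exponential moving average of the sampling noise with weight `a`, which is the variance-reduction effect that the correction term provides.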

artificial intelligence, machine learning, proceedings

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.80)

A Fast and Accurate Estimator for Large Scale Linear Model via Data Averaging

Neural Information Processing Systems Feb-13-2026, 20:20:55 GMT

The asymptotic behavior of the proposed estimation procedure is studied. Our theoretical results show that the proposed method can achieve a faster convergence rate than the optimal convergence rate for sampling methods.

algorithm, artificial intelligence, machine learning

Neural Information Processing Systems

Country: Asia > China > Beijing > Beijing (0.05)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Optimal Algorithms for Non-Smooth Distributed Optimization in Networks

Kevin Scaman, Francis Bach, Sebastien Bubeck, Laurent Massoulié, Yin Tat Lee

Neural Information Processing Systems Feb-13-2026, 17:35:41 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, convergence rate, optimization

Neural Information Processing Systems

Country:

North America > United States > Virginia (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > France (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)

Optimal Stochastic and Online Learning with Individual Iterates

Yunwen Lei, Peng Yang, Ke Tang, Ding-Xuan Zhou

Neural Information Processing Systems Feb-11-2026, 20:47:49 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, algorithm 1, iterate

Neural Information Processing Systems

Country:

Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
North America > Canada (0.04)

Industry: Education > Educational Setting > Online (0.51)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.50)

Adaptive Variance Reduction for Stochastic Optimization under Weaker Assumptions

Wei Jiang, Sifan Yang

Neural Information Processing Systems Feb-9-2026, 18:17:05 GMT

Problem (1) has been comprehensively investigated in the literature [Duchi et al., 2011, Kingma and Ba, 2015, Loshchilov and Hutter, 2017], and it is well-known that the classical stochastic gradient descent (SGD) achieves a convergence rate of

artificial intelligence, convergence rate, machine learning

Neural Information Processing Systems

Country: Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Information Technology (0.67)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes

Neural Information Processing Systems Dec-27-2025, 04:39:04 GMT

Policy Mirror Descent (PMD) is a general family of algorithms that covers a wide range of novel and fundamental methods in reinforcement learning.
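
As a rough illustration of what an exact PMD step looks like in the tabular case, the sketch below uses the KL divergence as the mirror map, which turns each update into the multiplicative-weights rule π_{k+1}(a|s) ∝ π_k(a|s) exp(η Q^{π_k}(s, a)), with Q^{π_k} computed by policy evaluation up to a numerical tolerance. The constant step size η, the two-state MDP, and all function names are assumptions made for this example; the step-size schedule analysed in the paper is not reproduced here.

```python
import math


def evaluate(policy, P, R, gamma, tol=1e-12):
    """Policy evaluation by fixed-point iteration; returns (Q^pi, V^pi) up to `tol`."""
    nS, nA = len(R), len(R[0])
    V = [0.0] * nS
    while True:
        Q = [[R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(nS))
              for a in range(nA)] for s in range(nS)]
        V_new = [sum(policy[s][a] * Q[s][a] for a in range(nA)) for s in range(nS)]
        if max(abs(v - w) for v, w in zip(V, V_new)) < tol:
            return Q, V_new
        V = V_new


def value_iteration(P, R, gamma, tol=1e-12):
    """Optimal values V*, used only to check the result."""
    nS, nA = len(R), len(R[0])
    V = [0.0] * nS
    while True:
        V_new = [max(R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(nS))
                     for a in range(nA)) for s in range(nS)]
        if max(abs(v - w) for v, w in zip(V, V_new)) < tol:
            return V_new
        V = V_new


def pmd_kl(P, R, gamma, eta, iters):
    """Exact PMD with a KL mirror map: pi_{k+1}(a|s) proportional to pi_k(a|s) * exp(eta * Q(s, a))."""
    nS, nA = len(R), len(R[0])
    policy = [[1.0 / nA] * nA for _ in range(nS)]  # start from the uniform policy
    for _ in range(iters):
        Q, _ = evaluate(policy, P, R, gamma)
        for s in range(nS):
            m = max(Q[s])  # shift by the max for numerical stability (cancels after normalising)
            w = [policy[s][a] * math.exp(eta * (Q[s][a] - m)) for a in range(nA)]
            z = sum(w)
            policy[s] = [wa / z for wa in w]
    return policy


# Hypothetical 2-state MDP: action 0 moves to state 0, action 1 moves to state 1.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [0.0, 1.0]]]
R = [[0.0, 0.0], [0.5, 1.0]]  # R[s][a]
gamma = 0.9
pi = pmd_kl(P, R, gamma, eta=1.0, iters=50)
_, V_pi = evaluate(pi, P, R, gamma)
print(pi, V_pi, value_iteration(P, R, gamma))
```

"Exact" here means the Q-values come from full policy evaluation rather than from samples, so the only approximation left is the evaluation tolerance.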

exact policy mirror descent, optimal convergence rate, policy mirror descent

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.59)

A Fast and Accurate Estimator for Large Scale Linear Model via Data Averaging

Neural Information Processing Systems Dec-25-2025, 22:57:52 GMT

This work is concerned with the estimation problem for the linear model when the sample size is extremely large and the data dimension can vary with the sample size. In this setting, the least-squares estimator based on the full data is not feasible with limited computational resources. Many existing methods for this problem are based on the sketching technique, which uses the sketched data to perform least-squares estimation. We derive fine-grained lower bounds on the conditional mean squared error for sketching methods.
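
For intuition about the sketching baseline described above, the Python snippet below compares full-data least squares with the simplest sketch, uniform row subsampling, on a one-dimensional model y = βx + noise without an intercept. This is an assumed toy setup; it illustrates sketched least squares only and does not implement the paper's data-averaging estimator. Names such as `ls_slope` and `uniform_sketch_slope` are hypothetical.

```python
import random


def ls_slope(xs, ys):
    """Least-squares slope for y ~ beta * x (no intercept): sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def uniform_sketch_slope(xs, ys, m, rng):
    """Sketch by keeping m rows chosen uniformly without replacement, then solve on the sketch."""
    idx = rng.sample(range(len(xs)), m)
    return ls_slope([xs[i] for i in idx], [ys[i] for i in idx])


rng = random.Random(0)
n, m, beta_true = 20000, 500, 2.0
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [beta_true * x + rng.gauss(0.0, 1.0) for x in xs]
beta_full = ls_slope(xs, ys)                         # uses all n rows
beta_sketch = uniform_sketch_slope(xs, ys, m, rng)   # uses only m rows
print(beta_full, beta_sketch)
```

With unit-variance x and noise, the full fit has standard error of roughly 1/sqrt(n) while the uniform sketch has roughly 1/sqrt(m), which is the kind of statistical price that lower bounds for sketching methods are concerned with.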

fast and accurate estimator, name change, scale linear model

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)
